A Comparison of String Distance Metrics on Usernames for Cross-Platform Identification
Not recorded
Abstract
People often use similar usernames across different social media sites, which makes it possible to correlate accounts across platforms. Since this was first noted in 2009, no research has examined how to exploit it most efficiently. We show that ignoring letter case improves matching, and we find that Smith-Waterman is the best metric for matching usernames, achieving a success rate of 76%. This implies that earlier work using other string matching metrics could achieve better results by using Smith-Waterman.
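To make the two findings concrete, here is a minimal sketch of case-insensitive username comparison with a Smith-Waterman local alignment score. The scoring parameters (match = 2, mismatch = -1, linear gap = -1), the normalization by the shorter username, and the function names are illustrative assumptions; the abstract does not state the configuration the authors used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty; the parameter values are assumptions,
    not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def username_similarity(u1, u2, ignore_case=True, match=2):
    """Similarity in [0, 1]: local alignment score divided by the
    score of a perfect match on the shorter username (an assumed
    normalization)."""
    if ignore_case:
        u1, u2 = u1.casefold(), u2.casefold()
    if not u1 or not u2:
        return 0.0
    raw = smith_waterman_score(u1, u2, match=match)
    return raw / (match * min(len(u1), len(u2)))


if __name__ == "__main__":
    print(username_similarity("JohnDoe", "johndoe"))                     # case ignored
    print(username_similarity("JohnDoe", "johndoe", ignore_case=False))  # case kept
    print(username_similarity("john_doe92", "JohnDoe"))
```

With case ignored, usernames that differ only in capitalization receive the maximum similarity, whereas the case-sensitive comparison penalizes each capitalization difference as a mismatch. A matching threshold on this similarity would have to be chosen empirically.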
Similar resources
A Comparison of String Distance Metrics for Name-Matching Tasks
Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid s...
A Comparison of String Metrics for Matching Names and Records
We describe an open-source Java toolkit of methods for matching names and records. We summarize results obtained from using various string distance metrics on the task of matching entity names. These metrics include distance functions proposed by several different communities, such as edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. We ...
Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization
Edit distance metrics are widely used for many applications such as string comparison and spelling error correction. Hamming distance is a metric for two equal-length strings, and Damerau-Levenshtein distance is a well-known metric for making spelling corrections through string-to-string comparison. Previous distance metrics seem to be appropriate for alphabetic languages like English and Eur...
Usability of String Distance Metrics for Name Matching Tasks in Polish
This paper presents the results of numerous experiments on the usability of well-established string distance metrics, and some new variants thereof, for various name matching tasks in Polish.
FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter
This paper reports the description and performance of our system, FBK-HLT, participating in SemEval 2015, Task #1 "Paraphrase and Semantic Similarity in Twitter", for both subtasks. We submitted two runs with different classifiers, combining typical features (lexical similarity, string similarity, word n-grams, etc.) with machine translation metrics and edit distance features. We outperfor...
Publication date: 2017